
Tokens and Literals

Tokens are the basic building blocks that make up expressions and other constructs in a programming language. The ScriptX bytecode compiler reads its stream of input characters in a single pass, interpreting the characters one at a time. A token is a sequence of one or more input characters that the compiler recognizes as having some meaning. The ScriptX language has the following kinds of tokens: symbols, operators, reserved words, punctuation marks, and literals.

Literals are sequences of characters that directly represent instances of a special set of classes. The value of a literal is the object it represents; the bytecode compiler recognizes literals in the input stream as objects. ScriptX provides string literals, name literals, numeric constants, and several kinds of collections and ranges as literal objects.

Symbols

A symbol must begin with an alphabetic character or an underscore, and it can be up to 256 characters long. The underscore, which can be used interchangeably with alphabetic characters, is the only non-alphanumeric character permitted in a symbol. Symbols cannot contain blanks or other separators. ScriptX is not case sensitive: the case a symbol has when it is first encountered is remembered and used when the symbol is printed, but myName and MYNAME refer to the same symbol.

symbol	::=	initialChar [ trailingChar ]* 
initialChar	::=	alphaChar | underscore 
trailingChar	::=	alphaChar | underscore | decimalDigit 
alphaChar	::=	a | b | c | . . . | x | y | z 
decimalDigit	::=	0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 
underscore	::=	_ 
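The symbol production can be checked mechanically. The following Python sketch illustrates the rules above; it is not the ScriptX compiler's own code. The function names are hypothetical, and accepting uppercase letters (which the alphaChar production does not list) is an assumption based on ScriptX's case insensitivity.

```python
import re

# initialChar [ trailingChar ]*, restricted to ASCII letters, digits, and "_".
# Assumption: uppercase letters are allowed because ScriptX ignores case.
SYMBOL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_SYMBOL_LENGTH = 256


def is_valid_symbol(text: str) -> bool:
    """Return True if text matches the symbol grammar and length limit."""
    return len(text) <= MAX_SYMBOL_LENGTH and SYMBOL_RE.fullmatch(text) is not None


def same_symbol(a: str, b: str) -> bool:
    """Symbols are compared without regard to case."""
    return a.lower() == b.lower()
```

For example, is_valid_symbol("_tmp2") is true, while "2tmp" and "my-var" are rejected.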

A symbol is a lexical name, that is, a name understood by the compiler. At compile time, the compiler associates a symbol with the thing in the program that it names, such as a variable, constant, or function.

Operators

All ScriptX operations are performed on operands that are objects, and the result of any operation is also an object. Many operators are shorthand for functions or generics defined by the ScriptX object system. For example, the RootObject class provides each object with default versions of the methods isComparable, localLT, localEqual, and eq. These four generic functions are visible to the scripter. A class can override these generics to provide its own version of the Comparison protocol; this is how the number and string classes come to have different definitions of equality.
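As a rough analogy only, the following Python sketch models an equality operator as shorthand for protocol generics that a subclass can override. The method names mirror isComparable and localEqual, but the default behaviors (same class for comparability, identity for equality), the Point class, and the equals function are hypothetical; this section does not state exactly which generics each operator calls.

```python
class RootObject:
    """Hypothetical stand-in for the defaults that RootObject provides."""

    def is_comparable(self, other):
        # Assumed default: only objects of the same class are comparable.
        return type(self) is type(other)

    def local_equal(self, other):
        # Assumed default: equality falls back to object identity.
        return self is other


class Point(RootObject):
    """A class that overrides the protocol to compare by value."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def local_equal(self, other):
        return (self.x, self.y) == (other.x, other.y)


def equals(a, b):
    """Models an equality operator: check comparability, then dispatch."""
    return a.is_comparable(b) and a.local_equal(b)
```

Two distinct RootObject instances are unequal under the identity default, while two Point objects with the same coordinates are equal because Point overrides local_equal.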

Table A-2 indicates the precedence and associativity of operators in the ScriptX language. In this table, precedence is ordered from highest to lowest.

Table A-2: Precedence and associativity of ScriptX operators

Operator or Token                             Associativity
element access in collections ( [ ] )         left to right
class and instance variable access ( . , ' )  left to right
coercion ( as )                               left to right
negation ( - )                                left to right
multiplication and division ( *, / )          left to right
addition and subtraction ( +, - )             left to right
=, >, <, !=, ==, !==, >=, <=, <>, contains    left to right
not                                           left to right
and                                           left to right
or                                            left to right
select                                        left to right
where, before, after, from, to, through       left to right
in, by, using                                 right to left
pipe operator ( | )                           left to right
repeat, guard, exit, for, return              left to right
thread operator ( & )                         left to right
assignment operator ( := )                    right to left
ScriptX expression                            left to right
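The ordering in Table A-2 can be fed to a standard precedence-climbing parser. The Python sketch below covers only the arithmetic, comparison, and logical rows; it takes pre-split tokens and returns a fully parenthesized string so that the grouping is visible. It is an illustration of the table, not the ScriptX parser, and it omits every other row.

```python
# Levels follow Table A-2 (higher binds tighter); only a subset is modeled.
BINARY_LEVELS = {
    "or": 1,
    "and": 2,
    "=": 4, ">": 4, "<": 4, "!=": 4, "==": 4, ">=": 4, "<=": 4, "<>": 4,
    "contains": 4,
    "+": 5, "-": 5,
    "*": 6, "/": 6,
}
NOT_LEVEL = 3  # `not` binds more loosely than the comparison operators
NEG_LEVEL = 7  # negation binds more tightly than `*` and `/`


def parse(tokens):
    """Return a fully parenthesized rendering of a token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_prefix():
        token = take()
        if token == "not":
            return f"(not {parse_level(NOT_LEVEL)})"
        if token == "-":
            return f"(-{parse_level(NEG_LEVEL)})"
        if token == "(":
            inner = parse_level(0)
            take()  # consume the closing ")"
            return inner
        return token  # an operand

    def parse_level(min_level):
        left = parse_prefix()
        # Requiring a strictly higher level keeps equal-level operators
        # grouped left to right.
        while peek() in BINARY_LEVELS and BINARY_LEVELS[peek()] > min_level:
            op = take()
            right = parse_level(BINARY_LEVELS[op])
            left = f"({left} {op} {right})"
        return left

    return parse_level(0)
```

For example, parse("not a = b".split()) yields "(not (a = b))", because not sits below the comparison operators in the table.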

Most ScriptX operators can be used with or without white space, and careful use of white space, which is defined on page 233, can make a program more readable. Certain operators, however, require a separator to prevent ambiguity. Operators that are reserved words, such as contains or as, must be set off by a separator, normally a blank. The subtraction operator also requires a separator, normally a blank, so that the compiler can distinguish it from the negation operator.
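Word operators need separators because a lexer typically forms a symbol from the longest possible run of letters, digits, and underscores, so xcontainsy is one symbol, not three tokens. The following Python sketch shows that longest-match rule under simplifying assumptions: only a handful of reserved-word operators are recognized, and the token kinds are illustrative labels, not ScriptX terms.

```python
import re

# Illustrative subset of reserved-word operators (compared without case).
WORD_OPERATORS = {"as", "contains", "and", "or", "not"}

# Skip leading blanks, then take the longest symbol, number, or operator.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[0-9]+|:=|[-+*/()])")


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest-match rules."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at index {pos}")
        lexeme = m.group(1)
        if lexeme.lower() in WORD_OPERATORS:
            kind = "operator"
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            kind = "symbol"
        elif lexeme[0].isdigit():
            kind = "number"
        else:
            kind = "other"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens
```

With blanks, x contains y yields a symbol, an operator, and a symbol; without them, the whole run is read as the single symbol xcontainsy.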

Reserved Words

Reserved words have special syntactic meanings in ScriptX, so, with one exception, they cannot be used as names of things. (The exception is that a reserved word can be used as a keyword, followed by a colon, in a keyword argument definition; keyword arguments are defined on page 248 of this grammar.) Table A-3 lists the reserved words, including several that are not currently used in the ScriptX grammar but are reserved for future use. Throughout this appendix, reserved words are indicated in the Courier typeface.

Table A-3: ScriptX reserved words

#key               first                 my
#rest              fn                    named
actual             for                   nextMethod
after              from                  not
and                function              object
any                get                   of
anything           global                on
as                 guard                 or
before             if                    otherwise
by                 imports               prefix
case               in                    readonly
catching           inclusive             reference
class              index                 renames
class method       initializer           repeat
class methods      inst                  return
class variables    inst methods          select
class vars         inst variables        set
collect            inst vars             settings
constant           instance              then
contains           instance methods      through
contents           instance variables    throw
continue           instance vars         throw again
continuous         into                  times
do                 it                    to
else               its                   transient
end                kind                  unglobal
every              last                  until
everything         local                 uses
excludes           macro                 using
exclusive          method                where
exit               middle                while
exports            module                with

Punctuation Marks and Meta-Characters

Separators and terminators are often classified together as a single kind of token: punctuation. Delimiters and lead-in characters are meta-characters that the compiler uses to identify other tokens. Punctuation marks and meta-characters have no meaning in and of themselves; they only set off or identify other tokens. A separator indicates the end of one token and the beginning of the next. A terminator indicates the end of a complete grammatical construct, such as an expression or a clause within an expression. A delimiter or a lead-in character indicates that the group of characters it is associated with in the input stream represents a particular kind of token.

In this grammar, those punctuation marks that have a visual printed representation, such as the comma or colon, are depicted in literal form, in the same typeface and style as reserved words. Nonprinting punctuation marks, such as the end-of-line character, are indicated in certain cases.

White space goes hand in hand with punctuation. Any input characters that the compiler ignores are called white space, and ScriptX has two kinds. The first kind is comments. ScriptX treats as a comment any sequence of characters that begins with two hyphens ( -- ) or two slashes ( // ) and ends with a newline or carriage return. A comment can be inserted in the middle of an expression; interpretation of the expression resumes on the next line. ScriptX also accepts C-style inset comments, delimited by /* and */. Unlike ANSI C, ScriptX allows inset comments to be nested.
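The comment rules can be illustrated with a small Python sketch that removes all three comment forms, counting nesting depth for inset comments. For illustration, it assumes that comment markers inside double-quoted string literals are not comments, and it replaces each inset comment with a single blank so that the comment still separates the tokens around it. strip_comments is a hypothetical name.

```python
def strip_comments(source: str) -> str:
    """Remove --, //, and nested /* */ comments, keeping string literals."""
    out = []
    i, n = 0, len(source)
    while i < n:
        two = source[i:i + 2]
        if source[i] == '"':
            # Assumption: copy a string literal verbatim, skipping escapes.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif two in ("--", "//"):
            # Line comment: skip up to, but not including, the end of line.
            while i < n and source[i] not in "\r\n":
                i += 1
        elif two == "/*":
            # Inset comment: track nesting until the depth returns to zero.
            depth = 0
            while i < n:
                if source[i:i + 2] == "/*":
                    depth += 1
                    i += 2
                elif source[i:i + 2] == "*/":
                    depth -= 1
                    i += 2
                    if depth == 0:
                        break
                else:
                    i += 1
            out.append(" ")  # the comment still acts as a separator
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

Because depth is counted, the first */ in x /* a /* b */ c */ y closes only the inner comment, which is the nesting behavior that distinguishes ScriptX from ANSI C.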

The second kind of white space is blank space: optional space, tab, and end-of-line characters that can be inserted between tokens to make code more readable. Some punctuation marks can serve as white space, while others cannot. If a punctuation mark can serve as white space, then two or more are permitted wherever one is permitted. Table A-4 lists the punctuation marks in the ScriptX language and shows whether each can be used as white space.

Table A-4: Punctuation marks in ScriptX

Punctuation             White Space  Purpose
blank                   yes          separator
comma ( , )             no           separator in some lists
colon ( : )             no           separator in some paired elements
newline                 yes          separator in incomplete expressions;
                                     terminator in complete expressions
carriage return         yes          separator in incomplete expressions;
                                     terminator in complete expressions
semicolon ( ; )         no           terminator (same role as a newline)
stop ( !! )             no           terminator that halts evaluation of expressions in the
                                     input stream
quotes ( " " )          no           delimiter for string literals
@-sign                  no           lead-in character for name literals
hash sign ( # )         no           lead-in character for array and keyed list literals
function ( -> )         no           lead-in character for a function body
backslash ( \ )         no           separator that also turns the next newline or carriage
                                     return in the input stream into white space;
                                     lead-in character for escape characters
two hyphens ( -- )      yes          lead-in character for a comment
two slashes ( // )      yes          lead-in character for a comment
comment ( /* */ )       yes          delimiters for ANSI C style inset comments, which can
                                     be nested
parentheses ( )         no           delimiters for certain kinds of lists, anonymous functions,
                                     and compound expressions; often required as a separator
                                     to ensure that an expression is parsed
brackets [ ]            no           delimiters for access to members of collections
braces { }              no           delimiters for restrictions
angle brackets < >      no           delimiters for hexadecimal constants used to represent
                                     Unicode characters (also used as comparison operators)

In most cases, this grammar does not explicitly indicate where end-of-line and white space characters are allowed, except where an end-of-line character is required as a separator. An incomplete expression can be broken with a newline character; evaluation of the expression resumes on the following line. ScriptX programmers commonly break lines after a binary operator, after a separator that cannot act as white space, or anywhere the compiler is expecting a closing delimiter such as a parenthesis.
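The following Python sketch is a simplified, hypothetical model of how a newline can act as either a separator or a terminator. It treats an expression as incomplete while a parenthesis, bracket, or brace is open, or when the line ends with a binary operator, a comma, a colon, a backslash, and, or or. It assumes comments have already been removed and that no string literal contains these characters; the real compiler's behavior follows from the full grammar, not from this list.

```python
OPENERS, CLOSERS = "([{", ")]}"
# Assumed line endings after which an expression continues.
CONTINUATION_ENDINGS = ("+", "-", "*", "/", "=", "<", ">", ",", ":", "\\")
WORD_CONTINUATIONS = ("and", "or")


def split_expressions(source: str) -> list:
    """Split source into complete expressions (simplified model)."""
    exprs, current, depth = [], "", 0
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if current:
            current += " "  # the previous newline acted as a separator
        for ch in line:
            if ch == ";" and depth == 0:
                # A semicolon always terminates the current expression.
                if current.strip():
                    exprs.append(current.strip())
                current = ""
                continue
            depth += (ch in OPENERS) - (ch in CLOSERS)
            current += ch
        words = current.split()
        incomplete = (
            depth > 0
            or current.rstrip().endswith(CONTINUATION_ENDINGS)
            or (bool(words) and words[-1].lower() in WORD_CONTINUATIONS)
        )
        if incomplete:
            if current.rstrip().endswith("\\"):
                # The backslash turns the newline into white space; drop it.
                current = current.rstrip()[:-1].rstrip()
        else:
            # The newline terminates a complete expression.
            if current.strip():
                exprs.append(current.strip())
            current = ""
    if current.strip():
        exprs.append(current.strip())
    return exprs
```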

Parentheses serve as separators as well as delimiters in expressions that could not otherwise be parsed. The following examples show how parentheses can turn an expression into a factor. Factors, which are indivisible syntactic units, are discussed in the section "Types of Expression" on page 236.

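function doIt a b -> (
	print a; print b
)

doIt -1 -2                        -- two negative numeric literals
-1
-2

doIt -(1) -(2)                    -- parsed as subtraction, which doIt does not support
no sub instance method

doIt negate(1)  negate(2)         -- four factors: negate, (1), negate, (2)
too many arguments 4 supplied 2 allowed

doIt (negate(1))  (negate(2))     -- two parenthesized factors
-1
-2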

As these examples demonstrate, ScriptX allows lists of arguments with minimal punctuation. If arguments are not separated by commas, then the arguments themselves must be factors. Enclosing an argument expression in parentheses makes it a factor, a complete and indivisible syntactic unit.
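The argument counts in these examples can be reproduced with a small Python sketch that splits a comma-free argument list into factors. Here a factor is assumed to be a symbol, a numeric literal (including a minus sign attached directly to a digit), or a parenthesized group; this is a simplification of the full factor production, and split_factors is a hypothetical name.

```python
def split_factors(arg_text: str) -> list:
    """Split a comma-free argument list into factors (simplified)."""
    factors, i, n = [], 0, len(arg_text)
    while i < n:
        ch = arg_text[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            # A parenthesized group is one factor, however deeply nested.
            depth, j = 0, i
            while j < n:
                depth += {"(": 1, ")": -1}.get(arg_text[j], 0)
                j += 1
                if depth == 0:
                    break
            factors.append(arg_text[i:j])
            i = j
        elif ch.isalnum() or ch == "_" or (ch == "-" and arg_text[i + 1:i + 2].isdigit()):
            # A symbol or numeric literal; a minus sign directly followed
            # by a digit belongs to the literal.
            j = i + 1
            while j < n and (arg_text[j].isalnum() or arg_text[j] in "_."):
                j += 1
            factors.append(arg_text[i:j])
            i = j
        else:
            # For example, a minus sign before "(" is an operator, not a factor.
            raise ValueError(f"not a factor at index {i}: {ch!r}")
    return factors
```

Under this model, negate(1)  negate(2) splits into four factors, matching the "4 supplied" message above, while (negate(1))  (negate(2)) splits into two.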

ScriptX punctuation allows for a variety of programming styles; its flexibility accounts for the fluidity of ScriptX code. Punctuation usually causes few difficulties, even for beginning scripters. For more information on punctuation, see the discussion that begins on page 29 of this volume.

Literals

Name literals represent instances of the class NameClass. Any valid name that is preceded by an at sign ( @ ) is interned. Name literals are full-fledged NameClass objects that have the same value at compile time and run time. They are used as labels in programs, often to represent a state or outcome. Two name literals that have the same value are the same object. That is, the names @insideOut, @insideout, and @INSIDEOUT are not merely equal, they are actually the same object.

nameLiteral	::=	@ [ trailingChar ]+ 
trailingChar	::=	alphaChar | underscore | decimalDigit 
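Interning can be sketched as a table lookup keyed on the lowercased name. In the following Python sketch, NameLiteral is a hypothetical stand-in for NameClass, and keeping the first spelling for printing is an assumption carried over from the way symbols behave.

```python
import re


class NameLiteral:
    """Hypothetical stand-in for an interned NameClass instance."""

    _interned = {}  # one object per name, keyed without regard to case

    def __init__(self, spelling):
        self.spelling = spelling

    def __repr__(self):
        return "@" + self.spelling

    @classmethod
    def intern(cls, text):
        # nameLiteral ::= @ [ trailingChar ]+
        m = re.fullmatch(r"@([A-Za-z0-9_]+)", text)
        if m is None:
            raise ValueError(f"not a name literal: {text!r}")
        spelling = m.group(1)
        key = spelling.lower()
        if key not in cls._interned:
            # Assumption: the first spelling seen is kept for printing.
            cls._interned[key] = cls(spelling)
        return cls._interned[key]
```

With this model, NameLiteral.intern("@insideOut") and NameLiteral.intern("@INSIDEOUT") return the same object, mirroring the identity described above.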

A string literal is a sequence of Unicode characters, enclosed in double quotes in the input stream. A string literal can extend over multiple lines, can be any length, and can include any valid Unicode character, including newline. To use certain nonprinting characters in a string, escape characters are required.

stringLiteral	::=	" [ unicodeChar ]* " 
unicodeChar	::=	-- any printing char 
	|	escapeChar 
	|	\< hexConst > 
escapeChar	::=	\n | \r | \t 
hexDigit	::=	0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f 
hexConst	::=	[ hexDigit ]+ 
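The string literal grammar can be illustrated with a Python sketch that decodes the text between the quotes. Mapping \n, \r, and \t to newline, carriage return, and tab is an assumption based on the usual C convention. Accepting uppercase hex digits is also an assumption. The sketch rejects any other backslash sequence because the grammar above does not list one; decode_string_literal is a hypothetical name.

```python
import re

# Assumed meanings of the escapeChar sequences, following C conventions.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def decode_string_literal(body: str) -> str:
    """Decode the characters between the double quotes of a string literal."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if nxt in ESCAPES:
            out.append(ESCAPES[nxt])
            i += 2
        elif nxt == "<":
            # \< hexConst > names a Unicode character by its hex code.
            m = re.match(r"<([0-9a-fA-F]+)>", body[i + 1:])
            if m is None:
                raise ValueError(f"malformed hex escape at index {i}")
            out.append(chr(int(m.group(1), 16)))
            i += 1 + m.end()
        else:
            raise ValueError(f"escape not covered by the grammar at index {i}")
    return "".join(out)
```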

The compiler automatically stores a string literal as a StringConstant object, which cannot be modified; to edit the string, first convert it to an instance of String or Text. Although name literals and symbols are not case sensitive in ScriptX, strings are. For more information on strings, see the "Text and Fonts" chapter of ScriptX Components Guide.

A numeric constant is automatically stored as an instance of one of the subclasses of Number: ImmediateInteger, LargeInteger, ImmediateFloat, or Float, depending on range and precision requirements. An integer constant is stored as an ImmediateInteger object, except when its value extends beyond the 29-bit storage range of the class. ScriptX has no unsigned data types. A floating point constant is converted to either an ImmediateFloat or a Float object depending on both range and precision requirements. For more information, see the "Numerics" chapter of ScriptX Components Guide.
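The choice between ImmediateInteger and LargeInteger can be sketched as a range check. The Python sketch below assumes that the 29 bits include the sign bit (two's complement), which gives a range of -2**28 through 2**28 - 1, and that values outside that range are stored as LargeInteger. Floating point constants are not modeled because this section does not give the criteria. integer_class_for is a hypothetical name.

```python
# Assumption: the 29-bit range includes the sign bit (two's complement).
IMMEDIATE_BITS = 29
IMMEDIATE_MIN = -(1 << (IMMEDIATE_BITS - 1))     # -268435456
IMMEDIATE_MAX = (1 << (IMMEDIATE_BITS - 1)) - 1  #  268435455


def integer_class_for(value: int) -> str:
    """Name the Number subclass an integer constant would be stored as."""
    if IMMEDIATE_MIN <= value <= IMMEDIATE_MAX:
        return "ImmediateInteger"
    # Assumption: anything outside the immediate range becomes LargeInteger.
    return "LargeInteger"
```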

numericConst	::=	mantissa [ exponent ] 
	|	hexLiteral 
mantissa	::=	integerConst [ . decimalDigit [ decimalDigit ]* ] 
exponent	::=	e integerConst 
integerConst	::=	[ negOperator ] [ decimalDigit ]+ 
negOperator	::=	- 
decimalDigit	::=	0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 
hexLiteral	::=	0x [ hexDigit ]+ 
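The numericConst production translates directly into a regular expression. The following Python sketch is a transcription for illustration. Like the grammar, it spells e, 0x, and the hex digits in lowercase; is_numeric_const is a hypothetical name. Because integerConst includes the minus sign, -1 is a single literal, which is why doIt -1 -2 in the earlier example receives two arguments.

```python
import re

# Pieces of the numericConst production, transcribed one rule at a time.
INTEGER = r"-?[0-9]+"                  # integerConst
MANTISSA = rf"{INTEGER}(?:\.[0-9]+)?"  # mantissa: optional fraction
EXPONENT = rf"e{INTEGER}"              # exponent
HEX = r"0x[0-9a-f]+"                   # hexLiteral
NUMERIC_CONST = re.compile(rf"{MANTISSA}(?:{EXPONENT})?|{HEX}")


def is_numeric_const(text: str) -> bool:
    """Return True if text is a complete numericConst."""
    return NUMERIC_CONST.fullmatch(text) is not None
```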

The ScriptX language provides several other literal constructions. A literal construction is an expression that creates a new instance of a class directly. Normally, you create a new instance either by calling the new method or by using the object expression. ScriptX creates instances of Array, KeyedLinkedList, ContinuousNumberRange, and NumberRange from literal expressions.

arrayLiteral	::=	#( exprList ) 
	|	#( keyedList ) 
	|	#( ) 
	|	#( : ) 
exprList	::=	simpleExpr [ , exprList ]* 
keyedList	::=	factor : simpleExpr [ , keyedList ]* 
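The following Python sketch shows how the four arrayLiteral forms might be told apart. Mapping #( exprList ) to Array and #( keyedList ) to KeyedLinkedList follows the class list above. Treating #( : ) as an empty KeyedLinkedList is an assumption. The sketch handles only flat, comma-separated items and returns them as source text rather than evaluating them; classify_array_literal is a hypothetical name.

```python
def classify_array_literal(text: str):
    """Return (class name, items) for a flat #( ... ) literal."""
    text = text.strip()
    if not (text.startswith("#(") and text.endswith(")")):
        raise ValueError("not an array literal")
    inner = text[2:-1].strip()
    if inner == "":
        return ("Array", [])               # #( )
    if inner == ":":
        return ("KeyedLinkedList", [])     # #( : ), assumed empty keyed list
    items = [item.strip() for item in inner.split(",")]
    if all(":" in item for item in items):
        # keyedList: factor : simpleExpr pairs
        pairs = [tuple(part.strip() for part in item.split(":", 1)) for item in items]
        return ("KeyedLinkedList", pairs)
    if any(":" in item for item in items):
        raise ValueError("cannot mix keyed and unkeyed items")
    return ("Array", items)                # exprList
```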

rangeLiteral	::=	factor to factor [ by factor ] 
	|	factor by factor [ by factor ] 
	|	factor to factor [ inclOption ] continuous 
	|	factor inclOption to factor [ inclOption ] continuous 
inclOption	::=	inclusive | exclusive 
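The range forms can be sketched in Python as follows. The following are assumptions, none of them stated in this section: a discrete range includes its upper bound when a step lands on it, the step defaults to 1, and both bounds of a continuous range default to inclusive. The sketch also reads the grammar so that an inclOption written before to applies to the lower bound, while one written after the upper factor applies to the upper bound. The function names are hypothetical.

```python
def number_range(start, end, step=1):
    """Values of `start to end by step` (assumed to include end)."""
    if step == 0:
        raise ValueError("step must be nonzero")
    values, value = [], start
    while (step > 0 and value <= end) or (step < 0 and value >= end):
        values.append(value)
        value += step
    return values


def in_continuous_range(x, low, high, low_inclusive=True, high_inclusive=True):
    """Membership test for a continuous range, e.g. `0 to 1 exclusive continuous`."""
    above_low = x >= low if low_inclusive else x > low
    below_high = x <= high if high_inclusive else x < high
    return above_low and below_high
```

For example, number_range(1, 10, 3) models 1 to 10 by 3, and in_continuous_range(x, 0, 1, high_inclusive=False) models 0 to 1 exclusive continuous under the reading described above.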

These collections and ranges can also be instantiated in the usual ways: by calling new on the appropriate class or by using an object definition expression.


This document is part of the ScriptX Language Guide, one of the volumes of the ScriptX Technical Reference Series. ScriptX is developed by the ScriptX Engineering Team at Apple Computer, successor to the Kaleida Engineering Team at Kaleida Labs, Inc.

Copyright 1996 Apple Computer, Inc. All Rights Reserved.